Codebook Design for Speech Guided Car Infotainment Systems
Internal identifier: 000623 (Main/Exploration); previous: 000622; next: 000624
Authors: Martin Raab [Germany]; Rainer Gruhn [Germany]; Elmar Noeth [Germany]
Source:
- Lecture Notes in Computer Science [ 0302-9743 ] ; 2008.
English descriptors
- Teeft :
- Additional gaussians, Additional language, Additional languages, Algorithm, Baseline, City names, Codebook, Codebook design, Codebooks, Database, English codebook, Experimental setup, Foreign names, Future work, Gaussians, German codebook, Gruhn, Hiwire, Hiwire data, Hiwire database, Human input, Infotainment, Infotainment scenario, Infotainment systems, Initial codebooks, Main language, Main language codebook, Main language performance, Maximum accuracy, Multilingual, Multilingual input, Multilingual recognition, Multilingual speech recognition, Multiple languages, Music titles, Mwcs, Native english codebook, Native speech, Nearest neighbor connections, Nonnative speech, Other words, Quantization, Raab, Results show, Same time, Sound patterns, Speech recognition, Speech recognizers, Such collections, Such systems, Training samples, Vector quantization, Word accuracies.
Abstract
In car infotainment systems, commands and other words in the user’s main language must be recognized with maximum accuracy, but it should also be possible to use foreign names, as they frequently occur in music titles or city names. Previous approaches did not address the constraint of conserving the main language performance when they extended their systems to cover multilingual input. In this paper we present an approach for speech recognition of multiple languages with constrained resources on embedded devices. To date, speech recognizers on such systems are typically semi-continuous speech recognizers, which are based on vector quantization. We provide evidence that common vector quantization algorithms are not optimal for such systems when they have to cope with input from multiple languages. Our new method combines information from multiple languages and creates a new codebook that can be used for efficient vector quantization in multilingual scenarios. Experiments show significantly improved speech recognition results.
Url:
DOI: 10.1007/978-3-540-69369-7_6
Affiliations:
- Germany
- Baden-Württemberg, Bavaria, Middle Franconia district, Tübingen district
- Erlangen, Ulm
- University of Ulm
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 001240
- to stream Istex, to step Curation: 001151
- to stream Istex, to step Checkpoint: 000447
- to stream Main, to step Merge: 000623
- to stream Main, to step Curation: 000623
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct:series"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Codebook Design for Speech Guided Car Infotainment Systems</title>
<author><name sortKey="Raab, Martin" sort="Raab, Martin" uniqKey="Raab M" first="Martin" last="Raab">Martin Raab</name>
</author>
<author><name sortKey="Gruhn, Rainer" sort="Gruhn, Rainer" uniqKey="Gruhn R" first="Rainer" last="Gruhn">Rainer Gruhn</name>
</author>
<author><name sortKey="Noeth, Elmar" sort="Noeth, Elmar" uniqKey="Noeth E" first="Elmar" last="Noeth">Elmar Noeth</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC</idno>
<date when="2008" year="2008">2008</date>
<idno type="doi">10.1007/978-3-540-69369-7_6</idno>
<idno type="url">https://api.istex.fr/document/B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001240</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">001240</idno>
<idno type="wicri:Area/Istex/Curation">001151</idno>
<idno type="wicri:Area/Istex/Checkpoint">000447</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000447</idno>
<idno type="wicri:doubleKey">0302-9743:2008:Raab M:codebook:design:for</idno>
<idno type="wicri:Area/Main/Merge">000623</idno>
<idno type="wicri:Area/Main/Curation">000623</idno>
<idno type="wicri:Area/Main/Exploration">000623</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Codebook Design for Speech Guided Car Infotainment Systems</title>
<author><name sortKey="Raab, Martin" sort="Raab, Martin" uniqKey="Raab M" first="Martin" last="Raab">Martin Raab</name>
<affiliation wicri:level="3"><country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm</wicri:regionArea>
<placeName><region type="land" nuts="1">Bade-Wurtemberg</region>
<region type="district" nuts="2">District de Tübingen</region>
<settlement type="city">Ulm</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3"><country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Dept. of Pattern Recognition, University of Erlangen, Erlangen</wicri:regionArea>
<placeName><settlement type="city">Erlangen</settlement>
<region type="land" nuts="1">Bavière</region>
<region type="district" nuts="2">District de Moyenne-Franconie</region>
</placeName>
</affiliation>
<affiliation></affiliation>
</author>
<author><name sortKey="Gruhn, Rainer" sort="Gruhn, Rainer" uniqKey="Gruhn R" first="Rainer" last="Gruhn">Rainer Gruhn</name>
<affiliation wicri:level="3"><country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Harman Becker Automotive Systems, Speech Dialog Systems, Ulm</wicri:regionArea>
<placeName><region type="land" nuts="1">Bade-Wurtemberg</region>
<region type="district" nuts="2">District de Tübingen</region>
<settlement type="city">Ulm</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="4"><country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Dept. of Information Technology, University of Ulm, Ulm</wicri:regionArea>
<placeName><region type="land" nuts="1">Bade-Wurtemberg</region>
<region type="district" nuts="2">District de Tübingen</region>
<settlement type="city">Ulm</settlement>
</placeName>
<orgName type="university">Université d'Ulm</orgName>
</affiliation>
</author>
<author><name sortKey="Noeth, Elmar" sort="Noeth, Elmar" uniqKey="Noeth E" first="Elmar" last="Noeth">Elmar Noeth</name>
<affiliation wicri:level="3"><country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Dept. of Pattern Recognition, University of Erlangen, Erlangen</wicri:regionArea>
<placeName><settlement type="city">Erlangen</settlement>
<region type="land" nuts="1">Bavière</region>
<region type="district" nuts="2">District de Moyenne-Franconie</region>
</placeName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2008</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="Teeft" xml:lang="en"><term>Additional gaussians</term>
<term>Additional language</term>
<term>Additional languages</term>
<term>Algorithm</term>
<term>Baseline</term>
<term>City names</term>
<term>Codebook</term>
<term>Codebook design</term>
<term>Codebooks</term>
<term>Database</term>
<term>English codebook</term>
<term>Experimental setup</term>
<term>Foreign names</term>
<term>Future work</term>
<term>Gaussians</term>
<term>German codebook</term>
<term>Gruhn</term>
<term>Hiwire</term>
<term>Hiwire data</term>
<term>Hiwire database</term>
<term>Human input</term>
<term>Infotainment</term>
<term>Infotainment scenario</term>
<term>Infotainment systems</term>
<term>Initial codebooks</term>
<term>Main language</term>
<term>Main language codebook</term>
<term>Main language performance</term>
<term>Maximum accuracy</term>
<term>Multilingual</term>
<term>Multilingual input</term>
<term>Multilingual recognition</term>
<term>Multilingual speech recognition</term>
<term>Multiple languages</term>
<term>Music titles</term>
<term>Mwcs</term>
<term>Native english codebook</term>
<term>Native speech</term>
<term>Nearest neighbor connections</term>
<term>Nonnative speech</term>
<term>Other words</term>
<term>Quantization</term>
<term>Raab</term>
<term>Results show</term>
<term>Same time</term>
<term>Sound patterns</term>
<term>Speech recognition</term>
<term>Speech recognizers</term>
<term>Such collections</term>
<term>Such systems</term>
<term>Training samples</term>
<term>Vector quantization</term>
<term>Word accuracies</term>
</keywords>
</textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: In car infotainment systems commands and other words in the user’s main language must be recognized with maximum accuracy, but it should be possible to use foreign names as they frequently occur in music titles or city names. Previous approaches did not address the constraint of conserving the main language performance when they extended their systems to cover multilingual input. In this paper we present an approach for speech recognition of multiple languages with constrained resources on embedded devices. Speech recognizers on such systems are typically to-date semi-continuous speech recognizers, which are based on vector quantization. We provide evidence that common vector quantization algorithms are not optimal for such systems when they have to cope with input from multiple languages. Our new method combines information from multiple languages and creates a new codebook that can be used for efficient vector quantization in multilingual scenarios. Experiments show significant improved speech recognition results.</div>
</front>
</TEI>
<affiliations><list><country><li>Allemagne</li>
</country>
<region><li>Bade-Wurtemberg</li>
<li>Bavière</li>
<li>District de Moyenne-Franconie</li>
<li>District de Tübingen</li>
</region>
<settlement><li>Erlangen</li>
<li>Ulm</li>
</settlement>
<orgName><li>Université d'Ulm</li>
</orgName>
</list>
<tree><country name="Allemagne"><region name="Bade-Wurtemberg"><name sortKey="Raab, Martin" sort="Raab, Martin" uniqKey="Raab M" first="Martin" last="Raab">Martin Raab</name>
</region>
<name sortKey="Gruhn, Rainer" sort="Gruhn, Rainer" uniqKey="Gruhn R" first="Rainer" last="Gruhn">Rainer Gruhn</name>
<name sortKey="Gruhn, Rainer" sort="Gruhn, Rainer" uniqKey="Gruhn R" first="Rainer" last="Gruhn">Rainer Gruhn</name>
<name sortKey="Noeth, Elmar" sort="Noeth, Elmar" uniqKey="Noeth E" first="Elmar" last="Noeth">Elmar Noeth</name>
<name sortKey="Raab, Martin" sort="Raab, Martin" uniqKey="Raab M" first="Martin" last="Raab">Martin Raab</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Sarre/explor/MusicSarreV3/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000623 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000623 | SxmlIndent | more
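Outside Dilib, the same record can also be read with a general-purpose XML parser. The following is a minimal Python sketch, not part of Dilib: the `summarize` function and the `RECORD` constant are hypothetical names introduced for illustration, and the embedded fragment is a trimmed copy of the record shown above. The `wicri:` attributes are omitted because the record does not declare that prefix, and a strict parser such as `xml.etree.ElementTree` rejects undeclared prefixes. To parse the full record, a namespace declaration for `wicri` would presumably have to be added first.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the record above (assumption: the full record has the same
# element layout). The wicri:* attributes are left out because the wicri
# prefix is not declared in the record.
RECORD = """<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="en">Codebook Design for Speech Guided Car Infotainment Systems</title>
<author><name sortKey="Raab, Martin" first="Martin" last="Raab">Martin Raab</name></author>
<author><name sortKey="Gruhn, Rainer" first="Rainer" last="Gruhn">Rainer Gruhn</name></author>
<author><name sortKey="Noeth, Elmar" first="Elmar" last="Noeth">Elmar Noeth</name></author>
</titleStmt>
<publicationStmt>
<date when="2008" year="2008">2008</date>
<idno type="doi">10.1007/978-3-540-69369-7_6</idno>
</publicationStmt>
</fileDesc></teiHeader></TEI></record>"""


def summarize(record_xml):
    """Return title, authors, year and DOI from a record string (hypothetical helper)."""
    root = ET.fromstring(record_xml)
    title_stmt = root.find(".//titleStmt")
    return {
        "title": title_stmt.findtext("title"),
        # One <name> element per <author>, in document order.
        "authors": [name.text for name in title_stmt.iterfind("author/name")],
        "year": root.find(".//publicationStmt/date").get("year"),
        "doi": root.findtext(".//publicationStmt/idno[@type='doi']"),
    }


summary = summarize(RECORD)
print(summary["authors"])
```

Returning a plain dictionary keeps the sketch independent of any particular bibliographic data model; the keys can be mapped onto whatever format is needed downstream.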
To link to this page within the Wicri network
{{Explor lien |wiki= Wicri/Sarre |area= MusicSarreV3 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:B0900FEE6C7C3D6AD35E4498DC98E585F9B042DC |texte= Codebook Design for Speech Guided Car Infotainment Systems }}
This area was generated with Dilib version V0.6.33.